现代生成模型大致分为两个主要类别:(1)可以产生高质量随机样品但无法估算新数据点的确切密度的模型,以及(2)提供精确密度估计的模型,以样本为代价潜在空间的质量和紧凑性。在这项工作中,我们提出了LED,这是一种与gan密切相关的新生成模型,不仅允许有效采样,而且允许有效的密度估计。通过最大程度地提高对数可能的歧视器输出,我们得出了一个替代对抗优化目标,鼓励生成的数据多样性。这种表述提供了对几种流行生成模型之间关系的见解。此外,我们构建了一个基于流的生成器,该发电机可以计算生成样品的精确概率,同时允许低维度变量作为输入。我们在各种数据集上的实验结果表明,我们的密度估计器会产生准确的估计值,同时保留了生成的样品质量良好。
translated by 谷歌翻译
虚拟面部化身将在身临其境的沟通,游戏和元视频中发挥越来越重要的作用,因此至关重要的是包容性。这需要准确地恢复出现,无论年龄,性别或种族如何,都以反照率表示。尽管在估计3D面部几何形状方面取得了重大进展,但反照率估计受到较少的关注。该任务在根本上是模棱两可的,因为观察到的颜色是反照率和照明的函数,这两者都是未知的。我们发现,由于(1)偏爱较轻的色素沉着和(2)算法溶液,因此当前的方法偏向浅色肤色,而无视光/反照率的歧义。为了解决这个问题,我们提出了一个新的评估数据集(公平)和算法(Trust),以改善反照率估计以及公平性。具体而言,我们创建了第一个面部反照率评估基准,其中受试者在肤色方面保持平衡,并使用单个类型学角度(ITA)度量测量精度。然后,我们通过建立关键观察结果来解决光/反照率的歧义:与面部的裁剪图像相反,整个场景的图像包含有关照明的重要信息,可用于歧义。信任通过在面部区域和从场景图像中获得的全球照明信号进行调节来回归面部反照率。我们的实验结果表明,就准确性和公平性而言,与最先进的反照率估计方法相比,相比之下。评估基准和代码将用于研究目的,网址为https://trust.is.tue.mpg.de。
translated by 谷歌翻译
传统的变形面模型提供了对表达的细粒度控制,但不能轻易捕获几何和外观细节。神经体积表示方法是光学 - 现实主义,但很难动画,并没有概括到看不见的表达。为了解决这个问题,我们提出了iMavatar(隐式的可变头像),这是一种从单眼视频学习隐含头头像的新方法。灵感来自传统3DMMS提供的细粒度控制机制,我们代表了通过学习的闪打和剥皮领域的表达和与姿势相关的变形。这些属性是姿势独立的,可用于使规范几何形状和纹理字段变成新颖的表达和姿势参数。我们使用射线跟踪和迭代根发现来定位每个像素的规范表面交叉点。关键贡献是我们的新型分析梯度制定,可实现来自视频的imavatars的端到端培训。我们的定量和定性地显示了我们的方法改善了几何形状,并与最先进的方法相比,涵盖了更完整的表达空间。
translated by 谷歌翻译
Numerous works use word embedding-based metrics to quantify societal biases and stereotypes in texts. Recent studies have found that word embeddings can capture semantic similarity but may be affected by word frequency. In this work we study the effect of frequency when measuring female vs. male gender bias with word embedding-based bias quantification methods. We find that Skip-gram with negative sampling and GloVe tend to detect male bias in high frequency words, while GloVe tends to return female bias in low frequency words. We show these behaviors still exist when words are randomly shuffled. This proves that the frequency-based effect observed in unshuffled corpora stems from properties of the metric rather than from word associations. The effect is spurious and problematic since bias metrics should depend exclusively on word co-occurrences and not individual word frequencies. Finally, we compare these results with the ones obtained with an alternative metric based on Pointwise Mutual Information. We find that this metric does not show a clear dependence on frequency, even though it is slightly skewed towards male bias across all frequencies.
translated by 谷歌翻译
Recent work has shown that fine-tuning large pre-trained language models on a collection of tasks described via instructions, a.k.a. instruction-tuning, improves their zero and few-shot generalization to unseen tasks. However, there is a limited understanding of the performance trade-offs of different decisions made during the instruction-tuning process. These decisions include the scale and diversity of the instruction-tuning benchmark, different task sampling strategies, fine-tuning with and without demonstrations, training using specialized datasets for reasoning and dialogue, and finally, the fine-tuning objectives themselves. In this paper, we characterize the effect of instruction-tuning decisions on downstream task performance when scaling both model and benchmark sizes. To this end, we create OPT-IML Bench: a large benchmark for Instruction Meta-Learning (IML) of 2000 NLP tasks consolidated into task categories from 8 existing benchmarks, and prepare an evaluation framework to measure three types of model generalizations: to tasks from fully held-out categories, to held-out tasks from seen categories, and to held-out instances from seen tasks. Through the lens of this framework, we first present insights about instruction-tuning decisions as applied to OPT-30B and further exploit these insights to train OPT-IML 30B and 175B, which are instruction-tuned versions of OPT. OPT-IML demonstrates all three generalization abilities at both scales on four different evaluation benchmarks with diverse tasks and input formats -- PromptSource, FLAN, Super-NaturalInstructions, and UnifiedSKG. Not only does it significantly outperform OPT on all benchmarks but is also highly competitive with existing models fine-tuned on each specific benchmark. We release OPT-IML at both scales, together with the OPT-IML Bench evaluation framework.
translated by 谷歌翻译
We seek methods to model, control, and analyze robot teams performing environmental monitoring tasks. During environmental monitoring, the goal is to have teams of robots collect various data throughout a fixed region for extended periods of time. Standard bottom-up task assignment methods do not scale as the number of robots and task locations increases and require computationally expensive replanning. Alternatively, top-down methods have been used to combat computational complexity, but most have been limited to the analysis of methods which focus on transition times between tasks. In this work, we study a class of nonlinear macroscopic models which we use to control a time-varying distribution of robots performing different tasks throughout an environment. Our proposed ensemble model and control maintains desired time-varying populations of robots by leveraging naturally occurring interactions between robots performing tasks. We validate our approach at multiple fidelity levels including experimental results, suggesting the effectiveness of our approach to perform environmental monitoring.
translated by 谷歌翻译
As demand for large corpora increases with the size of current state-of-the-art language models, using web data as the main part of the pre-training corpus for these models has become a ubiquitous practice. This, in turn, has introduced an important challenge for NLP practitioners, as they are now confronted with the task of developing highly optimized models and pipelines for pre-processing large quantities of textual data, which implies, effectively classifying and filtering multilingual, heterogeneous and noisy data, at web scale. One of the main components of this pre-processing step for the pre-training corpora of large language models, is the removal of adult and harmful content. In this paper we explore different methods for detecting adult and harmful of content in multilingual heterogeneous web data. We first show how traditional methods in harmful content detection, that seemingly perform quite well in small and specialized datasets quickly break down when confronted with heterogeneous noisy web data. We then resort to using a perplexity based approach but with a twist: Instead of using a so-called "clean" corpus to train a small language model and then use perplexity so select the documents with low perplexity, i.e., the documents that resemble this so-called "clean" corpus the most. We train solely with adult and harmful textual data, and then select the documents having a perplexity value above a given threshold. This approach will virtually cluster our documents into two distinct groups, which will greatly facilitate the choice of the threshold for the perplexity and will also allow us to obtain higher precision than with the traditional classification methods for detecting adult and harmful content.
translated by 谷歌翻译
Any strategy used to distribute a robot ensemble over a set of sequential tasks is subject to inaccuracy due to robot-level uncertainties and environmental influences on the robots' behavior. We approach the problem of inaccuracy during task allocation by modeling and controlling the overall ensemble behavior. Our model represents the allocation problem as a stochastic jump process and we regulate the mean and variance of such a process. The main contributions of this paper are: Establishing a structure for the transition rates of the equivalent stochastic jump process and formally showing that this approach leads to decoupled parameters that allow us to adjust the first- and second-order moments of the ensemble distribution over tasks, which gives the flexibility to decrease the variance in the desired final distribution. This allows us to directly shape the impact of uncertainties on the group allocation over tasks. We introduce a detailed procedure to design the gains to achieve the desired mean and show how the additional parameters impact the covariance matrix, which is directly associated with the degree of task allocation precision. Our simulation and experimental results illustrate the successful control of several robot ensembles during task allocation.
translated by 谷歌翻译
Scaling up language models has led to unprecedented performance gains, but little is understood about how the training dynamics change as models get larger. How do language models of different sizes learn during pre-training? Why do larger language models demonstrate more desirable behaviors? In this paper, we analyze the intermediate training checkpoints of differently sized OPT models (Zhang et al.,2022)--from 125M to 175B parameters--on next-token prediction, sequence-level generation, and downstream tasks. We find that 1) at a given perplexity and independent of model sizes, a similar subset of training tokens see the most significant reduction in loss, with the rest stagnating or showing double-descent behavior; 2) early in training, all models learn to reduce the perplexity of grammatical sequences that contain hallucinations, with small models halting at this suboptimal distribution and larger ones eventually learning to assign these sequences lower probabilities; 3) perplexity is a strong predictor of in-context learning performance on 74 multiple-choice tasks from BIG-Bench, and this holds independent of the model size. Together, these results show that perplexity is more predictive of model behaviors than model size or training computation.
translated by 谷歌翻译
System identification, also known as learning forward models, transfer functions, system dynamics, etc., has a long tradition both in science and engineering in different fields. Particularly, it is a recurring theme in Reinforcement Learning research, where forward models approximate the state transition function of a Markov Decision Process by learning a mapping function from current state and action to the next state. This problem is commonly defined as a Supervised Learning problem in a direct way. This common approach faces several difficulties due to the inherent complexities of the dynamics to learn, for example, delayed effects, high non-linearity, non-stationarity, partial observability and, more important, error accumulation when using bootstrapped predictions (predictions based on past predictions), over large time horizons. Here we explore the use of Reinforcement Learning in this problem. We elaborate on why and how this problem fits naturally and sound as a Reinforcement Learning problem, and present some experimental results that demonstrate RL is a promising technique to solve these kind of problems.
translated by 谷歌翻译